Choosing Linguistics over Vision to Describe Images

Authors

  • Ankush Gupta
  • Yashaswi Verma
  • C. V. Jawahar
Abstract

In this paper, we address the problem of automatically generating human-like descriptions for unseen images, given a collection of images and their corresponding human-generated descriptions. Previous attempts for this task mostly rely on visual clues and corpus statistics, but do not take much advantage of the semantic information inherent in the available image descriptions. Here, we present a generic method which benefits from all these three sources (i.e. visual clues, corpus statistics and available descriptions) simultaneously, and is capable of constructing novel descriptions. Our approach works on syntactically and linguistically motivated phrases extracted from the human descriptions. Experimental evaluations demonstrate that our formulation mostly generates lucid and semantically correct descriptions, and significantly outperforms the previous methods on automatic evaluation metrics. One of the significant advantages of our approach is that we can generate multiple interesting descriptions for an image. Unlike any previous work, we also test the applicability of our method on a large dataset containing complex images with rich descriptions.
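To make the described pipeline concrete, below is a minimal, hypothetical sketch of phrase ranking that combines the three sources the abstract names (visual cues, corpus statistics, and phrases taken from available descriptions). The function names, the weight alpha, and the linear combination are illustrative assumptions, not the authors' actual formulation.

```python
# Hypothetical sketch: candidate phrases mined from training descriptions are
# scored for a new image by combining (i) visual similarity to the training
# images the phrases came from and (ii) corpus statistics (phrase frequency).
# All names and weights are illustrative assumptions, not the paper's model.
from collections import Counter

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def rank_phrases(query_feat, training_data, alpha=0.7):
    """training_data: list of (image_feature, [phrases]) pairs.
    Returns phrases sorted by a combined visual + corpus score."""
    counts = Counter(p for _, phrases in training_data for p in phrases)
    total = sum(counts.values()) or 1

    scores = {}
    for feat, phrases in training_data:
        sim = cosine(query_feat, feat)          # visual cue
        for p in phrases:
            prior = counts[p] / total           # corpus statistic
            s = alpha * sim + (1 - alpha) * prior
            scores[p] = max(scores.get(p, 0.0), s)
    return sorted(scores, key=scores.get, reverse=True)

# The top-ranked noun/verb/prepositional phrases could then be composed into a
# sentence with a simple template, e.g. "<noun phrase> <verb phrase> <prep phrase>".
```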


Similar Articles

Similarity measurement for describe user images in social media

Online social networks like Instagram are places for communication. These media also produce rich metadata that are useful for further analysis in many fields, including health and cognitive science. Many researchers use such metadata (hashtags, images, etc.) to detect patterns of user activity. However, there are several serious ambiguities, such as how reliable these informa...


Robot Motion Vision Part II: Implementation

The idea of Fixation introduced a direct method for general recovery of shape and motion from images without using either feature correspondence or optical flow [1,2]. There are some parameters which have important effects on the performance of the fixation method. However, the theory of fixation does not say anything about the autonomous and correct choice of those parameters. This paper presents ...


Comparison of Different Targets Used in Augmented Reality Applications in Ubiquitous GIS

Drilling requires accurate information about the locations of underground infrastructure, or it can cause serious damage. Augmented Reality (AR), as a technology in Ubiquitous GIS (UBIGIS), can be used to visualize underground infrastructure on smartphones. Since smartphone sensors do not provide such accuracy, other approaches should be applied. Vision-based computer vision systems are well kn...


Robot Motion Vision Part I: Theory

A direct method called fixation is introduced for solving the general motion vision problem, arbitrary motion relative to an arbitrary environment. This method results in a linear constraint equation which explicitly expresses the rotational velocity in terms of the translational velocity. The combination of this constraint equation with the Brightness-Change Constraint Equation solves the gene...
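For reference, the Brightness-Change Constraint Equation mentioned above is the standard optical-flow constraint, written here in its common textbook form (the paper's exact notation may differ):

```latex
% Brightness-Change Constraint Equation (standard optical-flow constraint);
% E(x, y, t) is image brightness and (u, v) is the image-plane velocity.
\frac{\partial E}{\partial x}\,u + \frac{\partial E}{\partial y}\,v + \frac{\partial E}{\partial t} = 0
```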


YUV vs RGB - Choosing a Color Space for Human-Machine Interaction

This paper describes and compares two color spaces – YUV and RGB, taking into account possible human-computer interaction applications. Human perception-oriented properties are compared, including not only file size or bandwidth, but also subjective visibility of artifacts. 1700 tests on a group of 170 people were performed to describe the subjective quality of compressed YUV and RGB images. Th...
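As background for this comparison, below is a minimal sketch of an RGB to YUV conversion. The BT.601-style coefficients are an assumption; the paper does not specify which variant of the color space it uses.

```python
# Minimal RGB -> YUV conversion using the common BT.601-based coefficients
# (an assumption; other standards such as BT.709 use different weights).
def rgb_to_yuv(r, g, b):
    """r, g, b in [0, 1]; returns (Y, U, V) with Y (luma) in [0, 1]."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luma
    u = 0.492 * (b - y)                     # blue-difference chroma
    v = 0.877 * (r - y)                     # red-difference chroma
    return y, u, v

print(rgb_to_yuv(1.0, 0.5, 0.25))  # e.g. a warm orange tone
```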




Journal title:

Volume:   Issue:

Pages:

Publication date: 2012